Computer and Modernization, 2013, Vol. 1, Issue (5): 22-27. doi: 10.3969/j.issn.1006-2475.2013.05.006
• Algorithm Design and Analysis •
JIN Jian, CHEN Qun, ZHAO Bao-xue
Abstract: Join algorithms based on MapReduce are an active topic in massive-data research. However, most existing optimization work assumes that the data are evenly distributed, whereas in practical applications the data to be processed are often skewed. This paper proposes a MapReduce join algorithm, Skew Control Join, that adapts to severely skewed data. The algorithm obtains the overall data distribution by sampling, then uses a total partitioner to distribute the data evenly across all Reduce tasks. Experimental results show that the algorithm performs well when the processed data are skewed.
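The abstract names two phases, sampling and total partitioning, without giving details. The following Python sketch illustrates one way such a scheme can work; it is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the interval sampling, the round-robin scattering of a heavy key across several reducers, and the replication of the other table's matching records to those same reducers.

```python
import bisect
from collections import Counter

def interval_sample(keys, step):
    """Take every `step`-th key and sort the result (assumed sampling phase)."""
    return sorted(keys[::step])

def split_points(sorted_sample, num_reducers):
    """Pick num_reducers - 1 quantile boundaries from the sorted sample."""
    n = len(sorted_sample)
    return [sorted_sample[(i * n) // num_reducers] for i in range(1, num_reducers)]

def target_range(key, points):
    """Return the inclusive range of reducers that may hold `key`.

    A key that appears as several split points is heavy, so it spans
    several reducers; any other key maps to exactly one reducer.
    """
    lo = bisect.bisect_left(points, key)
    hi = bisect.bisect_right(points, key)
    return lo, hi

def scatter(key, seq, points):
    """Reducer for one record of the skewed table (round-robin within the range)."""
    lo, hi = target_range(key, points)
    return lo + seq % (hi - lo + 1)

def replicate(key, points):
    """Reducers that need a copy of the other table's record for `key`."""
    lo, hi = target_range(key, points)
    return list(range(lo, hi + 1))

def reducer_loads(keys, points, num_reducers):
    """Count how many records of the skewed table each reducer receives."""
    loads = Counter(scatter(k, i, points) for i, k in enumerate(keys))
    return [loads[r] for r in range(num_reducers)]

# Hypothetical skewed input: key 7 accounts for about 80% of the records.
keys = [7] * 800 + list(range(200))
points = split_points(interval_sample(keys, 10), 4)
skew_aware = reducer_loads(keys, points, 4)

# Baseline for comparison: plain hash partitioning on the key.
hashed = Counter(k % 4 for k in keys)
hash_loads = [hashed[r] for r in range(4)]

print("split points:", points)
print("sample-based loads:", skew_aware)
print("hash-based loads:", hash_loads)
```

In this toy input the heavy key occupies every split point, so its records are spread over all four reducers, while hash partitioning sends all of them to a single reducer. The price of scattering is that the matching records of the other join input must be copied to every reducer in the key's range, which is what `replicate` returns.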
Key words: join algorithm, data skew, total partition, sample
CLC Number: TP301.6
JIN Jian, CHEN Qun, ZHAO Bao-xue. Research on Data Skew Join Algorithm Based on MapReduce Model[J]. Computer and Modernization, 2013, 1(5): 22-27.
URL: http://www.c-a-m.org.cn/EN/10.3969/j.issn.1006-2475.2013.05.006
http://www.c-a-m.org.cn/EN/Y2013/V1/I5/22